Abstract: The aim of this paper is to prove an improved version of the bounded differences inequality for matrix valued functions (see Tropp (2012), Corollary 7.5), by developing the methods of Mackey et al. (2012). Along the way, we prove new trace inequalities for the matrix exponential.

1. Introduction

Let $Z_1, \ldots, Z_n$ be independent (or dependent) random variables, and $X = f(Z_1, \ldots, Z_n)$ be a random Hermitian matrix. One specific example is when $X = \sum_k X_k$ is a sum of random matrices. In many situations, we are interested in bounding the quantity $\mathbb{P}(\lambda_{\max}(X) \ge t)$.

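Ahlswede and Winter (2002) were the first to use the Laplace transform method in this setting. They show that for any random Hermitian matrix $X$,

\[ \mathbb{P}(\lambda_{\max}(X) \ge t) \le \inf_{\theta > 0} \left\{ e^{-\theta t}\, \mathbb{E} \operatorname{tr} \exp(\theta X) \right\}, \tag{1.1} \]

thus for $X = \sum_k X_k$,

\[ \mathbb{P}(\lambda_{\max}(X) \ge t) \le \inf_{\theta > 0} \left\{ e^{-\theta t}\, \mathbb{E} \operatorname{tr} \exp\Big( \sum_k \theta X_k \Big) \right\}. \tag{1.2} \]

Estimating the right hand side now poses a difficulty, because in general, $e^{A+B} \ne e^{A} \cdot e^{B}$ for the matrix exponential.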
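The failure of the product rule is easy to observe numerically. The following is a minimal sketch, assuming NumPy is available; the helper `expm_hermitian` and the particular test matrices are illustrative choices, not part of the paper.

```python
import numpy as np

def expm_hermitian(M):
    """Matrix exponential of a Hermitian matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return (V * np.exp(w)) @ V.conj().T

# Two Hermitian matrices that do not commute (chosen only for illustration).
A = np.array([[0.0, 1.0], [1.0, 0.0]])
B = np.array([[1.0, 0.0], [0.0, -1.0]])

lhs = expm_hermitian(A + B)
rhs = expm_hermitian(A) @ expm_hermitian(B)
gap = np.linalg.norm(lhs - rhs)  # strictly positive for this pair

# For commuting matrices the identity e^{B+C} = e^B e^C does hold.
C = np.diag([0.3, -1.2])
commuting_gap = np.linalg.norm(
    expm_hermitian(B + C) - expm_hermitian(B) @ expm_hermitian(C)
)
```

Here `gap` is far from zero while `commuting_gap` is at rounding level, which is why the scalar argument for sums of independent variables does not carry over directly.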

Tropp (2012) proves the following lemma to estimate the right hand side:

Lemma 1.1 (Lemma 3.4 of Tropp (2012)). Consider a finite sequence $\{X_k\}$ of independent, random, self-adjoint matrices. Then

\[ \mathbb{E} \operatorname{tr} \exp\Big( \sum_k \theta X_k \Big) \le \operatorname{tr} \exp\Big( \sum_k \log \mathbb{E}\, e^{\theta X_k} \Big) \quad \text{for } \theta \in \mathbb{R}. \]

This Lemma is based on a corollary of Tropp (2012) (which is derived from a Theorem of Lieb):

Corollary 1.1 (Corollary 3.3 of Tropp (2012)). Let $H$ be a fixed self-adjoint matrix, and let $X$ be a random self-adjoint matrix. Then

\[ \mathbb{E} \operatorname{tr} \exp(H + X) \le \operatorname{tr} \exp\big( H + \log( \mathbb{E}\, e^{X} ) \big). \]

These inequalities are used in Tropp (2012) to prove matrix versions of various concentration inequalities for sums of random matrices (Chernoff, Bernstein), and inequalities for matrix martingales (Azuma-Hoeffding, and matrix bounded differences).

Mackey et al. (2012) take a different approach. They make the following basic definition (Mackey et al. (2012), Definition 2.2):

Definition 1 (Matrix Stein Pair). Let $(Z, Z')$ be an exchangeable pair of random variables taking values in a Polish space $\mathcal{Z}$, and let $\Psi : \mathcal{Z} \to \mathbb{H}^d$ be a measurable function. Define the random Hermitian matrices

\[ X := \Psi(Z) \quad \text{and} \quad X' := \Psi(Z'). \]

We say that $(X, X')$ is a matrix Stein pair if there is a constant $\alpha \in (0, 1]$ for which

\[ \mathbb{E}(X - X' \mid Z) = \alpha X \quad \text{almost surely.} \tag{1.3} \]

The constant $\alpha$ is called the scale factor of the pair. When discussing a matrix Stein pair $(X, X')$, we always assume that $\mathbb{E}\|X\|^2 < \infty$.

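Suppose that $(X, X')$ is a matrix Stein pair. Then they write the derivative of the trace moment generating function $m(\theta)$ of $X$ as

\[ m'(\theta) = \mathbb{E}\, \bar{\operatorname{tr}}\big( X e^{\theta X} \big) = \frac{1}{\alpha}\, \mathbb{E}\, \bar{\operatorname{tr}}\big( (X - X')\, e^{\theta X} \big) = \frac{1}{2\alpha}\, \mathbb{E}\, \bar{\operatorname{tr}}\big( (X - X')(e^{\theta X} - e^{\theta X'}) \big), \]

using exchangeability in the last step.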

To further bound this quantity, they prove the following trace inequality (Lemma 3.4 of Mackey et al. (2012)):

Lemma 1.2. Let $I$ be an interval of the real line. Suppose that $g : I \to \mathbb{R}$ is a weakly increasing function and that $h : I \to \mathbb{R}$ is a function whose derivative $h'$ is convex. For all matrices $A, B \in \mathbb{H}^d(I)$,

\[ \bar{\operatorname{tr}}\big[ (g(A) - g(B)) \cdot (h(A) - h(B)) \big] \le \frac{1}{2}\, \bar{\operatorname{tr}}\big[ (g(A) - g(B)) \cdot (A - B) \cdot (h'(A) + h'(B)) \big]. \]

When $h'$ is concave, the inequality is reversed. The same results hold for the standard trace.

This lemma is based on a standard trace inequality (Petz (1994), Proposition 3).

Corollary 1.2. For $\theta > 0$,

\[ \bar{\operatorname{tr}}\big( (X - X')(e^{\theta X} - e^{\theta X'}) \big) \le \frac{\theta}{2}\, \bar{\operatorname{tr}}\big( (X - X')^2 (e^{\theta X} + e^{\theta X'}) \big). \]

Using this corollary, we can bound the derivative of the trace mgf:

\[ m'(\theta) \le \mathbb{E}\, \bar{\operatorname{tr}}\Big( \frac{\theta}{4\alpha} (X - X')^2 (e^{\theta X} + e^{\theta X'}) \Big) = \mathbb{E}\, \bar{\operatorname{tr}}\Big( \frac{\theta}{2\alpha} (X - X')^2 e^{\theta X} \Big), \]

and this quantity can be bounded in many situations.

The advantage of this approach compared to Tropp (2012) is that the constants are often 
better, and some dependent cases can be also treated. The disadvantage is that other than 
sums of random matrices, few other cases can be written as Stein pairs. This means that 
matrix martingales, and the method of bounded differences, are not possible to recover. 

The purpose of this paper is to show that Mackey et al. (2012) can be improved to show 
the method of bounded differences for matrix valued functions. We are going to prove new 
trace inequalities, which generalize Corollary 1.2, and allow us to go beyond Stein pairs. 

Our inequality also works for weakly dependent random variables. We quantify the de- 
pendence by a matrix: 

Definition (Dobrushin's interdependence matrix). Let $X := (X_1, \ldots, X_n)$ be a random vector taking values in $\Lambda := (\Lambda_1, \ldots, \Lambda_n)$, with law $\mu$. Suppose that $D := (d_{ij})_{1 \le i,j \le n}$ is an $n \times n$ matrix with nonnegative entries and zeroes on the diagonal such that for any $i$, and any $x, y \in \Lambda$,

\[ d_{TV}\big( \mu_i(\cdot \mid x_{-i}),\, \mu_i(\cdot \mid y_{-i}) \big) \le \sum_{j=1}^{n} d_{ij}\, \mathbb{1}[x_j \ne y_j], \]

where $d_{TV}$ is the total variational distance. Then we say that $D$ is a Dobrushin interdependence matrix for the random vector $X$ (or equivalently, for the measure $\mu$). Here $x_{-i} := (x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n)$ and $\mu_i(\cdot \mid x_{-i})$ is the conditional distribution of $X_i$ given $X_{-i} = x_{-i}$.

Concentration inequalities for real valued Hamming Lipschitz functions under the condition $\|D\|_2 < 1$ have been proven in Chatterjee (2005), Chapter 4.

2. Results 

The following result is a strengthening of Corollary 7.5 of Tropp (2012). We have exponent $-t^2/\sigma^2$ instead of $-t^2/(8\sigma^2)$ in the independent case, and our result also works under Dobrushin-type weak dependence.

Theorem 2.1. Let $\{Z_k : k = 1, \ldots, n\}$ be an independent family of random variables, and let $H$ be a function that maps $n$ variables to a self-adjoint matrix of dimension $d$. Consider a sequence $\{A_k\}$ of fixed self-adjoint matrices that satisfy

\[ \big( H(z_1, \ldots, z_k, \ldots, z_n) - H(z_1, \ldots, z_k', \ldots, z_n) \big)^2 \preceq A_k^2, \tag{2.1} \]

where $z_i$ and $z_i'$ range over all possible values of $Z_i$ for each index $i$. Compute the variance parameter

\[ \sigma^2 := \Big\| \sum_k A_k^2 \Big\|. \tag{2.2} \]

Then for all $t > 0$,

\[ \mathbb{P}\{ \lambda_{\max}(H(Z) - \mathbb{E} H(Z)) \ge t \} \le d \cdot e^{-t^2/\sigma^2}, \tag{2.3} \]

where $Z = (Z_1, \ldots, Z_n)$.

Alternatively, suppose that $\{Z_k : k = 1, \ldots, n\}$ is a family of dependent random variables with Dobrushin interdependence matrix $D$. If $D$ satisfies $\max(\|D\|_1, \|D\|_\infty) < 1$, then for every $t > 0$,

\[ \mathbb{P}\{ \lambda_{\max}(H(Z) - \mathbb{E} H(Z)) \ge t \} \le d \cdot e^{-t^2/(c\sigma^2)}, \tag{2.4} \]

with

\[ c := \frac{1/(1 - \|D\|_1) + 1/(1 - \|D\|_\infty)}{2}. \tag{2.5} \]
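To make the role of the constant in (2.5) concrete, here is a small sketch that evaluates the right hand sides of (2.3) and (2.4) as we read them. It assumes that $\|D\|_1$ and $\|D\|_\infty$ are the induced norms (maximum column sum and maximum row sum); the function names and the example matrix are hypothetical and only serve as illustration.

```python
import math

def dobrushin_constant(D):
    """Constant c from (2.5); D is a square list of lists with nonnegative entries."""
    n = len(D)
    norm_1 = max(sum(D[i][j] for i in range(n)) for j in range(n))    # max column sum
    norm_inf = max(sum(D[i][j] for j in range(n)) for i in range(n))  # max row sum
    if max(norm_1, norm_inf) >= 1:
        raise ValueError("the theorem requires max(||D||_1, ||D||_inf) < 1")
    return 0.5 * (1.0 / (1.0 - norm_1) + 1.0 / (1.0 - norm_inf))

def tail_bound(t, sigma2, d, D=None):
    """Right hand side of (2.3) if D is None, of (2.4) otherwise."""
    c = 1.0 if D is None else dobrushin_constant(D)
    return d * math.exp(-t * t / (c * sigma2))

# Illustrative interdependence matrix: ||D||_1 = 0.75 and ||D||_inf = 0.5.
D_example = [[0.0, 0.5, 0.0],
             [0.0, 0.0, 0.0],
             [0.0, 0.25, 0.0]]
c_example = dobrushin_constant(D_example)  # (4 + 2) / 2
```

With $D = 0$ the constant equals 1, so the dependent bound reduces to the independent one.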

A simple corollary of this is the following matrix Hoeffding bound (in the independent case, similar to Corollary 4.2 of Mackey et al. (2012)):

Corollary 2.1. Let $\{Y_k : k = 1, \ldots, n\}$ be an independent family of $\mathbb{H}^d$ matrices, and let $H$ be a function that maps $n$ variables to a self-adjoint matrix of dimension $d$. Consider a sequence $\{A_k\}$ of fixed self-adjoint matrices that satisfy

\[ \mathbb{E} Y_k = 0, \qquad Y_k^2 \preceq A_k^2. \tag{2.6} \]

Define the variance parameter

\[ \sigma^2 := \Big\| \sum_k A_k^2 \Big\|. \]

Then for all $t > 0$,

\[ \mathbb{P}\Big\{ \lambda_{\max}\Big( \sum_k Y_k \Big) \ge t \Big\} \le d \cdot e^{-t^2/(4\sigma^2)}. \]

Alternatively, for $\{Y_k : k = 1, \ldots, n\}$ weakly dependent with Dobrushin matrix $D$ satisfying $\max(\|D\|_1, \|D\|_\infty) < 1$, we have

\[ \mathbb{P}\Big\{ \lambda_{\max}\Big( \sum_k Y_k \Big) \ge t \Big\} \le d \cdot e^{-t^2/(4c\sigma^2)}, \]

with $c$ defined as in (2.5).

Remark 2.1. The 4 in the exponent comes from the fact that (2.1) is satisfied for $2A_k$.
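The claim in Remark 2.1 can be verified in one line. In the sketch below, $Y_k'$ denotes an arbitrary second value of $Y_k$ (a symbol introduced here only for illustration), and both values are assumed to satisfy (2.6):

```latex
\begin{align*}
2Y_k^2 + 2Y_k'^2 - (Y_k - Y_k')^2 &= (Y_k + Y_k')^2 \succeq 0, \\
\text{hence}\quad (Y_k - Y_k')^2 &\preceq 2Y_k^2 + 2Y_k'^2 \preceq 4A_k^2 = (2A_k)^2 .
\end{align*}
```

Replacing $A_k$ by $2A_k$ multiplies the variance parameter by 4, which accounts for the extra factor in the exponent.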

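An important tool in the proof is the following trace inequality:

Theorem 2.2. Let $A, B, C$ be Hermitian matrices of equal size, then

\[ \operatorname{tr}\big( C (e^{A} - e^{B}) \big) \le \operatorname{tr}\Big( \frac{C^2 + (A - B)^2}{2} \cdot \frac{e^{A} + e^{B}}{2} \Big). \]

Corollary 2.2. Under the same conditions, for $\theta > 0$,

\[ \operatorname{tr}\big( C (e^{\theta A} - e^{\theta B}) \big) \le \theta \cdot \operatorname{tr}\Big( \frac{C^2 + (A - B)^2}{2} \cdot \frac{e^{\theta A} + e^{\theta B}}{2} \Big), \]

and for $\theta < 0$,

\[ \operatorname{tr}\big( C (e^{\theta A} - e^{\theta B}) \big) \ge \theta \cdot \operatorname{tr}\Big( \frac{C^2 + (A - B)^2}{2} \cdot \frac{e^{\theta A} + e^{\theta B}}{2} \Big). \]

Proof. Apply Theorem 2.2 to $\theta A$, $\theta B$, $\theta C$, and divide both sides by $\theta$. □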
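A numerical sanity check of the trace inequality can be set up as follows. This is a sketch under our reading of Theorem 2.2 and Corollary 2.2, namely $\operatorname{tr}(C(e^{\theta A} - e^{\theta B})) \le \theta \operatorname{tr}\big(\tfrac{C^2 + (A-B)^2}{2} \cdot \tfrac{e^{\theta A} + e^{\theta B}}{2}\big)$ for $\theta > 0$; it assumes NumPy, and the helper names and the Gaussian test distribution are illustrative choices.

```python
import numpy as np

def expm_hermitian(M):
    """Matrix exponential of a Hermitian matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return (V * np.exp(w)) @ V.conj().T

def random_hermitian(d, rng):
    """Random d x d Hermitian matrix (an arbitrary test distribution)."""
    G = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
    return (G + G.conj().T) / 2

def trace_inequality_sides(A, B, C, theta=1.0):
    """Both sides of the inequality for theta > 0 (theta = 1 is the theorem)."""
    eA, eB = expm_hermitian(theta * A), expm_hermitian(theta * B)
    lhs = np.trace(C @ (eA - eB)).real
    half_squares = (C @ C + (A - B) @ (A - B)) / 2
    rhs = theta * np.trace(half_squares @ (eA + eB) / 2).real
    return lhs, rhs

rng = np.random.default_rng(0)
slacks = []
for _ in range(200):
    A, B, C = (random_hermitian(4, rng) for _ in range(3))
    theta = rng.uniform(0.1, 2.0)
    lhs, rhs = trace_inequality_sides(A, B, C, theta)
    slacks.append((rhs - lhs) / (1.0 + abs(rhs)))  # relative slack
min_slack = min(slacks)
```

A nonnegative `min_slack` (up to rounding) is consistent with the theorem; such a check does not replace the proof in Section 4.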

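We also prove this result:

Theorem 2.3. (Matrix Hölder inequality) Let $A, B, C, D$ be Hermitian matrices with $A$ and $B$ positive semidefinite, and $0 \le p \le 1$, then we have

\[ \operatorname{Re}\big( \operatorname{tr}( C A^{p} D B^{1-p} + C A^{1-p} D B^{p} ) \big) \le \operatorname{tr}\Big( \frac{C^2 + D^2}{2} (A + B) \Big). \tag{2.7} \]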

3. Proof of the bounded differences inequality 

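For a random matrix $X$, the normalized trace mgf is defined, similarly to Definition 3.2 of Mackey et al. (2012), as

\[ m(\theta) := m_X(\theta) = \mathbb{E}\, \bar{\operatorname{tr}}\, e^{\theta X} = \frac{1}{d}\, \mathbb{E} \operatorname{tr} e^{\theta X}, \]

which may not exist for all values of $\theta$.

We are going to use Proposition 3.3 of Mackey et al. (2012):

Proposition 3.1. (Matrix Laplace Transform Method) Let $X$ be a random matrix with normalized trace mgf $m(\theta) := \mathbb{E}\, \bar{\operatorname{tr}}\, e^{\theta X}$. For each $t \in \mathbb{R}$,

\[ \mathbb{P}\{ \lambda_{\max}(X) \ge t \} \le d \cdot \inf_{\theta > 0} \exp\{ -\theta t + \log m(\theta) \}, \quad \text{and} \tag{3.1} \]
\[ \mathbb{P}\{ \lambda_{\min}(X) \le t \} \le d \cdot \inf_{\theta < 0} \exp\{ -\theta t + \log m(\theta) \}. \tag{3.2} \]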

Proof of Theorem 2.1. We follow the Markov chain approach of Chatterjee (2005).

As shown in Chapter 4 of Chatterjee (2005), an exchangeable pair $(X, X')$ automatically defines a reversible Markov kernel $P$ as

\[ Pf(x) := \mathbb{E}( f(X') \mid X = x ), \]

where $f$ is any function with $\mathbb{E}|f(X)| < \infty$. Suppose that $X$ takes values in a Polish space $\Omega$, then

Lemma 3.1 (Lemma 4.1 of Chatterjee (2005)). Suppose that $f : \Omega \to \mathbb{R}$ is a measurable function with $\mathbb{E} f(X) = 0$, and there is a finite constant $L$ such that

\[ \sum_{k=0}^{\infty} \big| P^k f(x) - P^k f(y) \big| \le L \quad \text{for every } x \text{ and } y, \tag{3.3} \]

then

\[ F(x, y) := \sum_{k=0}^{\infty} \big( P^k f(x) - P^k f(y) \big) \tag{3.4} \]

satisfies $F(X, X') = -F(X', X)$ and $\mathbb{E}( F(X, X') \mid X ) = f(X)$.

With a simple adaptation of the proof, the reader can verify that this Lemma also holds for matrix valued functions $f : \Omega \to \mathbb{H}^d$, with (3.3) replaced by

\[ \sum_{k=0}^{\infty} \big\| P^k f(x) - P^k f(y) \big\| \le L \quad \text{for every } x \text{ and } y. \tag{3.5} \]

We need to define property P as in Chatterjee (2005):

Definition. Let $\{X(k)\}_{k \ge 0}$ and $\{X'(k)\}_{k \ge 0}$ be two chains from the kernel defined by $(X, X')$, for arbitrary initial values $x, y \in \Omega$. We say that a coupling of these two chains satisfies property P if for every $x, y \in \Omega$, and every $k$, the marginal distribution of $X(k)$ only depends on $x$, and the marginal distribution of $X'(k)$ only depends on $y$.

We propose the following matrix version of Lemma 4.2 of Chatterjee (2005) (the proof can be easily adapted):

Lemma 3.2. Suppose that a coupling of $\{X(k)\}_{k \ge 0}$ and $\{X'(k)\}_{k \ge 0}$ satisfies property P. Let $f : \Omega \to \mathbb{H}^d$ be a function such that $\mathbb{E} f(X) = 0$. Suppose that there exists $L < \infty$ such that for every $x, y \in \Omega$,

\[ \sum_{k=0}^{\infty} \big\| \mathbb{E}\big( f(X(k)) - f(X'(k)) \mid X(0) = x, X'(0) = y \big) \big\| \le L. \tag{3.6} \]

Then, the function $F$ defined as

\[ F(x, y) := \sum_{k=0}^{\infty} \mathbb{E}\big( f(X(k)) - f(X'(k)) \mid X(0) = x, X'(0) = y \big) \tag{3.7} \]

satisfies $F(X, X') = -F(X', X)$ and $\mathbb{E}( F(X, X') \mid X ) = f(X)$.

First, we will prove the independent case:

Proof of independent case. Let $X := (X_1, \ldots, X_n)$ be a vector with independent components ($X_i$: component $i$, $\{X(k)\}_{k \ge 0}$: Markov chain). Let

\[ X^{(r)} := \big( X_1^{(r)}, \ldots, X_n^{(r)} \big) \tag{3.8} \]

be independent copies of $X$, for $r \ge 0$. Let $I, I_1, \ldots, I_k, \ldots$ be uniformly distributed indices in $[n]$, independent of each other and of $X$ and $X^{(r)}$. Define $X'$ as

\[ X_i' = X_i \ \text{ for } i \ne I \quad \text{and} \quad X_I' = X_I^{(0)}. \]

Now we are ready to construct $X(k)$ and $X'(k)$:

Suppose that $X(0) = x$ and $X'(0) = y$, for $x, y \in \Omega$. For $k \ge 1$, define $X(k)$ as

\[ X_i(k) := X_i(k-1) \ \text{ for } i \ne I_k \quad \text{and} \quad X_{I_k}(k) := X_{I_k}^{(k)}. \]

Similarly, for $k \ge 1$, define $X'(k)$ as

\[ X_i'(k) := X_i'(k-1) \ \text{ for } i \ne I_k \quad \text{and} \quad X_{I_k}'(k) := X_{I_k}(k). \]

With this definition, $\{X(k)\}_{k \ge 0}$ and $\{X'(k)\}_{k \ge 0}$ have the same distribution as the Markov chain defined by the kernel $P$; moreover, $(X(k), X'(k))$ satisfy property P. In practice, we will start with $X(0) = X$ and $X'(0) = X'$.
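The coupling above can be simulated directly. The sketch below uses only the Python standard library; drawing the components from Uniform(0, 1) and the function name `coupled_chains` are assumptions made for illustration. It records, at every step, the set of coordinates in which the two chains differ.

```python
import random

def coupled_chains(x, y, steps, rng):
    """Run X(k) and X'(k) from X(0)=x, X'(0)=y with shared indices and fresh values.

    Components are drawn from Uniform(0, 1) purely for illustration.
    """
    X, Xp = list(x), list(y)
    path = [(tuple(X), tuple(Xp))]
    for _ in range(steps):
        i_k = rng.randrange(len(X))  # index I_k, uniform on {0, ..., n-1}
        fresh = rng.random()         # the fresh value X^{(k)}_{I_k}
        X[i_k] = fresh
        Xp[i_k] = fresh              # X'_{I_k}(k) := X_{I_k}(k)
        path.append((tuple(X), tuple(Xp)))
    return path

rng = random.Random(0)
n = 5
X0 = [rng.random() for _ in range(n)]
I = rng.randrange(n)
X0p = list(X0)
X0p[I] = rng.random()                # X' differs from X only in coordinate I

path = coupled_chains(X0, X0p, steps=200, rng=rng)
differing = [[i for i in range(n) if a[i] != b[i]] for a, b in path]
```

Each entry of `differing` is either `[I]` or empty: the chains can disagree only in coordinate $I$, and they coincide from the first time $I_k = I$ onward. This is the reason an indicator of the event that $I$ has not yet been chosen appears when the sum over $k$ is evaluated.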

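We can prove condition (3.6) by the coupon collector's problem.

For this chain, we can write

\[
\begin{aligned}
m'(\theta) &= \mathbb{E}\, \bar{\operatorname{tr}}\big( f(X) e^{\theta f(X)} \big) = \mathbb{E}\, \bar{\operatorname{tr}}\big( F(X, X') \cdot e^{\theta f(X)} \big) \\
&= \frac{1}{2}\, \mathbb{E}\, \bar{\operatorname{tr}}\Big( \sum_{k=0}^{\infty} \big( f(X(k)) - f(X'(k)) \big) \cdot \big( e^{\theta f(X)} - e^{\theta f(X')} \big) \Big) \\
&= \frac{1}{2} \sum_{k=0}^{\infty} \mathbb{E}\, \bar{\operatorname{tr}}\Big( \mathbb{1}[I \notin \{I_1, \ldots, I_k\}] \big( f(X(k)) - f(X'(k)) \big) \cdot \big( e^{\theta f(X)} - e^{\theta f(X')} \big) \Big).
\end{aligned}
\]

Now, using Theorem 2.2, and the fact that $(f(X(k)) - f(X'(k)))^2 \preceq A_I^2$ and $(f(X) - f(X'))^2 \preceq A_I^2$, we obtain for $\theta > 0$

\[
\begin{aligned}
m'(\theta) &\le \frac{\theta}{2} \sum_{k=0}^{\infty} \mathbb{E}\, \bar{\operatorname{tr}}\Big( \mathbb{1}[I \notin \{I_1, \ldots, I_k\}]\, A_I^2 \cdot \frac{e^{\theta f(X)} + e^{\theta f(X')}}{2} \Big) \\
&= \frac{\theta}{2} \sum_{k=0}^{\infty} \Big( 1 - \frac{1}{n} \Big)^{k}\, \mathbb{E}\, \bar{\operatorname{tr}}\Big( \frac{1}{n} \sum_{i=1}^{n} A_i^2\, e^{\theta f(X)} \Big) \\
&= \frac{\theta}{2}\, \mathbb{E}\, \bar{\operatorname{tr}}\Big( \sum_{i=1}^{n} A_i^2\, e^{\theta f(X)} \Big) \\
&\le \frac{1}{2} \sigma^2 \theta\, m(\theta),
\end{aligned}
\]

so

\[ \log m(\theta) \le \frac{1}{4} \theta^2 \sigma^2, \]

thus by Proposition 3.1,

\[ \mathbb{P}\{ \lambda_{\max}(H(Z) - \mathbb{E} H(Z)) \ge t \} \le d \cdot \inf_{\theta > 0} \exp\Big( -\theta t + \frac{\theta^2 \sigma^2}{4} \Big) \le d \cdot \exp\Big( -\frac{t^2}{\sigma^2} \Big). \]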

□ 
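The last step of the independent case is a one-dimensional optimization. Assuming the bound $\log m(\theta) \le \theta^2\sigma^2/4$ obtained above, the minimizing $\theta$ (denoted $\theta^\ast$ here, a symbol introduced for illustration) can be written out explicitly:

```latex
\begin{align*}
\frac{d}{d\theta}\Big( -\theta t + \tfrac{1}{4}\theta^2\sigma^2 \Big)
  = -t + \tfrac{1}{2}\theta\sigma^2 = 0
  \quad &\Longleftrightarrow \quad \theta^\ast = \frac{2t}{\sigma^2} > 0, \\
-\theta^\ast t + \tfrac{1}{4}(\theta^\ast)^2\sigma^2
  &= -\frac{2t^2}{\sigma^2} + \frac{t^2}{\sigma^2} = -\frac{t^2}{\sigma^2}.
\end{align*}
```

This matches the exponent in (2.3).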



Now we prove the general case: 

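Proof for Dobrushin condition. Let $X, X', X(k), X'(k)$ be defined analogously to the way it is done in the proof of Theorem 4.3 of Chatterjee (2005): $X'$ is defined by choosing $I$ uniformly in $[n]$, and then resampling $X_I$ conditioned on the rest (Gibbs sampler), while $X(k), X'(k)$ are defined by choosing $I_k$ uniformly in $[n]$, resampling $X_{I_k}(k-1)$ and $X_{I_k}'(k-1)$ in the greedy coupling way, i.e. $X_{I_k}(k)$ is resampled conditionally on the rest of $X(k-1)$, $X_{I_k}'(k)$ is resampled conditionally on the rest of $X'(k-1)$, and at the same time, these two conditional distributions are coupled in the maximal coupling (see Lindvall (1992)). Property P can be proven by induction; verifying (3.6) is left to the reader as an exercise.

We can write $f(X(k)) - f(X'(k))$ as a telescopic sum:

\[
\begin{aligned}
f(X(k)) - f(X'(k)) = \sum_{i=1}^{n} \Big[ & f\big( X_1(k), \ldots, X_i(k), X_{i+1}'(k), \ldots, X_n'(k) \big) \\
& - f\big( X_1(k), \ldots, X_{i-1}(k), X_i'(k), \ldots, X_n'(k) \big) \Big] =: \sum_{i=1}^{n} Z_i(k).
\end{aligned}
\]

We have

\[
\begin{aligned}
m'(\theta) &= \mathbb{E}\, \bar{\operatorname{tr}}\big( f(X) e^{\theta f(X)} \big) = \mathbb{E}\, \bar{\operatorname{tr}}\big( F(X, X') e^{\theta f(X)} \big) \\
&= \frac{1}{2}\, \mathbb{E}\, \bar{\operatorname{tr}}\big( F(X, X') ( e^{\theta f(X)} - e^{\theta f(X')} ) \big) \\
&= \sum_{k=0}^{\infty} \sum_{i=1}^{n} \frac{1}{2}\, \mathbb{E}\, \bar{\operatorname{tr}}\big( Z_i(k) \cdot ( e^{\theta f(X)} - e^{\theta f(X')} ) \big).
\end{aligned}
\]

Let $L_i(k) := \mathbb{1}[X_i(k) \ne X_i'(k)]$, and let $l_i(k) := \mathbb{E}( L_i(k) \mid X, X' )$. Obviously $Z_i(k) = L_i(k) Z_i(k)$, so by Theorem 2.2, we have for $\theta > 0$

\[
\begin{aligned}
\frac{1}{2}\, \mathbb{E}\, \bar{\operatorname{tr}}\big( L_i(k) Z_i(k) \cdot ( e^{\theta f(X)} - e^{\theta f(X')} ) \big)
&\le \frac{1}{8}\, \mathbb{E}\, \bar{\operatorname{tr}}\Big( L_i(k)\, \theta \big( (Z_i(k))^2 + (f(X) - f(X'))^2 \big) \cdot \big( e^{\theta f(X)} + e^{\theta f(X')} \big) \Big) \\
&\le \frac{1}{8}\, \mathbb{E}\, \bar{\operatorname{tr}}\Big( L_i(k)\, \theta \big( A_i^2 + A_I^2 \big) \cdot \big( e^{\theta f(X)} + e^{\theta f(X')} \big) \Big) \\
&\le \frac{1}{4}\, \mathbb{E}\, \bar{\operatorname{tr}}\Big( l_i(k)\, \theta \big( A_i^2 + A_I^2 \big) \cdot e^{\theta f(X)} \Big).
\end{aligned}
\]

Let $D$ be the Dobrushin interdependence matrix of $X_1, \ldots, X_n$, and let us denote $B := \big(1 - \frac{1}{n}\big) E + \frac{1}{n} D$, with $E$ being the $n \times n$ identity matrix. Pages 77-78 of Chatterjee (2005) prove that $l(k) \le B^k e(I)$, with $e(I)$ denoting the vector whose $I$th coordinate is 1 and the rest is 0. Therefore

\[
\begin{aligned}
\sum_{i=1}^{n} \frac{1}{4}\, \mathbb{E}\, \bar{\operatorname{tr}}\Big( l_i(k) \big( A_i^2 + A_I^2 \big) \cdot \theta e^{\theta f(X)} \Big)
&\le \sum_{i=1}^{n} \frac{1}{4}\, \mathbb{E}\, \bar{\operatorname{tr}}\Big( [B^k e(I)]_i \cdot \big( A_i^2 + A_I^2 \big) \cdot \theta e^{\theta f(X)} \Big) \\
&= \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{n} \frac{1}{4}\, \mathbb{E}\, \bar{\operatorname{tr}}\Big( [B^k e(j)]_i \cdot \big( A_i^2 + A_j^2 \big) \cdot \theta e^{\theta f(X)} \Big) \\
&= \frac{\theta}{4n}\, \mathbb{E}\, \bar{\operatorname{tr}}\Big( \Big[ \sum_{i=1}^{n} \Big( \sum_{j=1}^{n} [B^k e(j)]_i \Big) A_i^2 + \sum_{j=1}^{n} \Big( \sum_{i=1}^{n} [B^k e(j)]_i \Big) A_j^2 \Big] e^{\theta f(X)} \Big).
\end{aligned}
\]

Now

\[ \sum_{i=1}^{n} [B^k e(j)]_i = \| B^k e(j) \|_1 \le ( \|B\|_1 )^k, \quad \text{and} \quad \sum_{j=1}^{n} [B^k e(j)]_i \le ( \|B\|_\infty )^k, \]

so

\[ m'(\theta) \le \sum_{k=0}^{\infty} \frac{\theta}{4n} \big( \|B\|_1^k + \|B\|_\infty^k \big)\, \sigma^2 m(\theta). \]

Summing up in $k$, and noticing that $\|B\|_1 \le 1 - \frac{1}{n} + \frac{1}{n} \|D\|_1$ (and similarly for $\|B\|_\infty$) gives

\[ m'(\theta) \le \frac{\theta}{4} \Big( \frac{1}{1 - \|D\|_1} + \frac{1}{1 - \|D\|_\infty} \Big) \sigma^2 m(\theta) = \frac{1}{2} c\, \sigma^2 \theta\, m(\theta), \]

and thus we get the result by Proposition 3.1, as in the independent case. □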

□ 
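The summation over $k$ in the dependent case is a geometric series in $\|B\|_1$ and $\|B\|_\infty$. The following sketch, in plain Python, checks on one example that this series reproduces the constant $c$ of (2.5). It assumes our reading $B = (1 - 1/n)E + (1/n)D$ and the induced norms; the helper names and the matrix `D` are hypothetical.

```python
def norm_1(M):
    """Induced 1-norm: maximum absolute column sum."""
    n = len(M)
    return max(sum(abs(M[i][j]) for i in range(n)) for j in range(n))

def norm_inf(M):
    """Induced infinity-norm: maximum absolute row sum."""
    return max(sum(abs(v) for v in row) for row in M)

def gibbs_comparison_matrix(D):
    """B = (1 - 1/n) E + (1/n) D, with E the identity matrix."""
    n = len(D)
    return [[(1 - 1 / n) * (i == j) + D[i][j] / n for j in range(n)]
            for i in range(n)]

D = [[0.0, 0.5, 0.0],
     [0.0, 0.0, 0.0],
     [0.0, 0.25, 0.0]]
n = len(D)
B = gibbs_comparison_matrix(D)

# D has nonnegative entries and zero diagonal, so ||B||_1 = 1 - 1/n + ||D||_1 / n.
geometric_sum_1 = 1 / (1 - norm_1(B))      # sum over k >= 0 of ||B||_1^k
geometric_sum_inf = 1 / (1 - norm_inf(B))  # sum over k >= 0 of ||B||_inf^k
c = (geometric_sum_1 + geometric_sum_inf) / (2 * n)
```

For this example the two series equal $n/(1 - \|D\|_1) = 12$ and $n/(1 - \|D\|_\infty) = 6$, so `c` agrees with $\tfrac12\big(\tfrac{1}{1 - 0.75} + \tfrac{1}{1 - 0.5}\big)$.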

4. Proof of trace inequalities 

Before starting the proof, we state a few simple inequalities:

• For any $P, Q \in \mathbb{M}^d$, we have

\[ PQ + Q^*P^* \preceq PP^* + Q^*Q, \tag{4.1} \]

which follows from $(P - Q^*)(P^* - Q) \succeq 0$.

• Also, we can easily prove that if $P, Q, R, S \in \mathbb{H}^d$, then

\[ \operatorname{Re}( \operatorname{tr}(PQRS) ) \le \operatorname{tr}\Big( \frac{(P^2 + R^2)(Q^2 + S^2)}{4} \Big). \tag{4.2} \]

To prove this, just apply (4.1) to $(PQ)(RS)$ and to $(QR)(SP)$, and rearrange the terms.
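Both auxiliary inequalities can be checked numerically. The sketch below assumes NumPy and uses arbitrary complex test matrices. It verifies that $PP^* + Q^*Q - PQ - Q^*P^*$ is positive semidefinite, and then checks the trace bound obtained by applying (4.1) to $(PQ)(RS)$ and to $(QR)(SP)$ and averaging; for general (not necessarily Hermitian) matrices this is the form the argument yields, stated here as our own rearrangement.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4

def random_matrix():
    """Random complex d x d matrix (an arbitrary test distribution)."""
    return rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))

def adj(M):
    """Conjugate transpose."""
    return M.conj().T

# (4.1): PP* + Q*Q - PQ - Q*P* equals (P - Q*)(P - Q*)*, hence is PSD.
P, Q = random_matrix(), random_matrix()
gap_41 = P @ adj(P) + adj(Q) @ Q - P @ Q - adj(Q) @ adj(P)
min_eig_41 = np.linalg.eigvalsh(gap_41).min()

# Trace bound behind (4.2): apply (4.1) to (PQ)(RS) and (QR)(SP), then average.
R, S = random_matrix(), random_matrix()
lhs_42 = np.trace(P @ Q @ R @ S).real
rhs_42 = 0.25 * np.trace(
    P @ Q @ adj(P @ Q) + adj(R @ S) @ (R @ S)
    + Q @ R @ adj(Q @ R) + adj(S @ P) @ (S @ P)
).real
```

For Hermitian $P, Q, R, S$ the right hand side simplifies, by cyclicity of the trace, to $\tfrac14 \operatorname{tr}\big((P^2 + R^2)(Q^2 + S^2)\big)$.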

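Proof of Theorem 2.2. First notice that adding a constant times the identity matrix to $A$ and $B$ multiplies both sides by the same number. Therefore we can suppose without loss of generality that $A, B \succ 0$.

By taking Taylor expansion, the inequality becomes

\[ \sum_{k=1}^{\infty} \frac{1}{k!} \operatorname{tr}\big( C (A^k - B^k) \big) \le \sum_{k=1}^{\infty} \frac{1}{(k-1)!} \operatorname{tr}\Big( \frac{C^2 + (A - B)^2}{2} \cdot \frac{A^{k-1} + B^{k-1}}{2} \Big). \]

To show this, we will prove that the inequality holds for each term in the sums, i.e. we claim:

\[ \operatorname{tr}\big( C (A^k - B^k) \big) \le k \cdot \operatorname{tr}\Big( \frac{C^2 + (A - B)^2}{2} \cdot \frac{A^{k-1} + B^{k-1}}{2} \Big). \tag{4.3} \]

Now

\[ A^k - B^k = A^k - A^{k-1}B + A^{k-1}B - A^{k-2}B^2 + \ldots + AB^{k-1} - B^k. \]

The terms in the sum are of the form $A^{l}B^{k-l} - A^{l-1}B^{k-l+1} = A^{l-1}(A - B)B^{k-l}$. We claim the following about two 'symmetric' terms from such a sum (which clearly implies (4.3)):

Lemma 4.1. If $A, B, C$ are Hermitian matrices with $A, B$ positive definite, and $0 \le k \le n$ are integers, then

\[ \operatorname{Re}\Big( \operatorname{tr}\big( C ( A^{k}(A - B)B^{n-k} + A^{n-k}(A - B)B^{k} ) \big) \Big) \le \operatorname{tr}\Big( \frac{C^2 + (A - B)^2}{2} (A^n + B^n) \Big). \tag{4.4} \]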

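Proof. Let us denote $D := A - B$, then the inequality becomes

\[ \operatorname{Re}\Big( \operatorname{tr}\big[ C ( A^{k} D B^{n-k} + A^{n-k} D B^{k} ) \big] \Big) \le \operatorname{tr}\Big( \frac{C^2 + D^2}{2} (A^n + B^n) \Big). \tag{4.5} \]

If $k = n/2$, then this follows from (4.2). Suppose, without loss of generality, that $k < n/2$. Now we can get rid of $C$ in the following way:

\[
\begin{aligned}
\operatorname{Re}\big( \operatorname{tr}[ C A^{k} D B^{n-k} + C A^{n-k} D B^{k} ] \big)
&= \operatorname{Re}\Big( \operatorname{tr}\big( (B^{n/2} C)(A^{k} D B^{n/2-k}) + (C A^{n/2})(A^{n/2-k} D B^{k}) \big) \Big) \\
&\le \frac{1}{2} \operatorname{Re}\Big( \operatorname{tr}\big( C^2 B^n + A^{2k} D B^{n-2k} D + C^2 A^n + A^{n-2k} D B^{2k} D \big) \Big).
\end{aligned}
\]

The terms involving $C$ are the same as on the right hand side of (4.5), so we need to prove that

\[ \operatorname{tr}\big( A^{2k} D B^{n-2k} D + A^{n-2k} D B^{2k} D \big) \le \operatorname{tr}\big( D^2 (A^n + B^n) \big). \tag{4.6} \]

Both sides are real so we did not write the real part.

Let us denote, for $0 \le l \le n$,

\[ R := \operatorname{tr}\big( D^2 (A^n + B^n) \big), \quad \text{and} \quad T_l := \operatorname{tr}\big( A^{l} D B^{n-l} D + A^{n-l} D B^{l} D \big). \]

We can show that when $l = n/2$, then $T_l \le R$ holds. Let us denote the maximum of $T_l$ as $T := \max_{1 \le l \le n} T_l$, and suppose that the maximum is taken at $l_0 < n/2$, then

\[
\begin{aligned}
& \operatorname{tr}\big( A^{l_0} D B^{n-l_0} D + A^{n-l_0} D B^{l_0} D \big) \\
&\quad = \operatorname{tr}\big( (A^{l_0} D B^{n/2-l_0})(B^{n/2} D) + (A^{n/2} D)(B^{l_0} D A^{n/2-l_0}) \big),
\end{aligned}
\]

so as previously, using (4.1) we can show that $T_{l_0} \le \frac{1}{2} T_{2l_0} + \frac{1}{2} R$, i.e. $T \le \frac{1}{2} T + \frac{1}{2} R$, and thus $T \le R$.

For $l_0 > n/2$, we can show that $T_{l_0} \le \frac{1}{2} T_{2n-2l_0} + \frac{1}{2} R$. □

□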

Proof of Theorem 2.3. (2.7) is a generalization of (4.5). If $p$ is rational of the form $\frac{a}{b}$, then we can proceed the same way as in the proof of Lemma 4.1, with $A^{1/b}$ and $B^{1/b}$ taking the role of $A$ and $B$. If $p$ is irrational, we can write it as the limit of rationals, and use continuity to get the result. □
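The matrix Hölder inequality can also be probed numerically, including for non-rational exponents. The sketch below assumes NumPy and our reading of (2.7), namely $\operatorname{Re}\operatorname{tr}(CA^{p}DB^{1-p} + CA^{1-p}DB^{p}) \le \operatorname{tr}\big(\tfrac{C^2 + D^2}{2}(A + B)\big)$; the helper names and test distributions are illustrative.

```python
import numpy as np

def random_hermitian(d, rng):
    """Random d x d Hermitian matrix (an arbitrary test distribution)."""
    G = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
    return (G + G.conj().T) / 2

def random_psd(d, rng):
    """Random d x d positive semidefinite matrix."""
    G = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
    return G @ G.conj().T

def psd_power(M, p):
    """Fractional power of a positive semidefinite matrix via eigendecomposition."""
    w, V = np.linalg.eigh(M)
    w = np.clip(w, 0.0, None)  # remove tiny negative eigenvalues caused by rounding
    return (V * w**p) @ V.conj().T

def holder_sides(A, B, C, D, p):
    """Left and right hand sides of (2.7) as read above."""
    lhs = np.trace(
        C @ psd_power(A, p) @ D @ psd_power(B, 1 - p)
        + C @ psd_power(A, 1 - p) @ D @ psd_power(B, p)
    ).real
    rhs = np.trace((C @ C + D @ D) / 2 @ (A + B)).real
    return lhs, rhs

rng = np.random.default_rng(2)
slacks = []
for _ in range(200):
    A, B = random_psd(3, rng), random_psd(3, rng)
    C, D = random_hermitian(3, rng), random_hermitian(3, rng)
    lhs, rhs = holder_sides(A, B, C, D, rng.uniform(0.0, 1.0))
    slacks.append((rhs - lhs) / (1.0 + abs(rhs)))  # relative slack
worst = min(slacks)
```

A nonnegative `worst` (up to rounding) is consistent with Theorem 2.3 on these samples.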
5. Open problems

Based on extensive numerical evidence and on some theoretical results, we conjecture the following trace inequalities:

1. Let $A, B, C \in \mathbb{H}^d$, then

\[ \operatorname{tr}\big( C (e^{A} - e^{B}) \big) \ \ldots \tag{5.1} \]

2. We expect (5.1) to generalize to any monotone increasing convex function $f$, i.e. we expect that in such situations,

\[ \operatorname{tr}\big( C (f(A) - f(B)) \big) \ \ldots \tag{5.2} \]

3. (5.1) would imply concentration for self-bounding matrix valued functions (in the sense of Boucheron, Lugosi and Massart (2009)). A similar setting has been already studied in Mackey (2012), Theorem 25, however, this theorem requires a very strong self-reciprocity condition, which may not be satisfied in general.
We define matrix self-bounding functions as follows:

Definition 2. An $\mathbb{H}^d$ valued function $H(Z_1, \ldots, Z_n)$ is said to be $(a, b)$ matrix self-bounding, if for any $Z_1', \ldots, Z_n'$,

(a) $H(Z) - H(Z_1, \ldots, Z_i', \ldots, Z_n) \preceq I_d$, and

(b) $\sum_{i=1}^{n} \big( H(Z) - H(Z_1, \ldots, Z_i', \ldots, Z_n) \big)_+ \preceq aH(Z) + bI_d$.

An $\mathbb{H}^d$ valued function $H(Z_1, \ldots, Z_n)$ is said to be weakly $(a, b)$ matrix self-bounding, if for any $Z_1', \ldots, Z_n'$,

\[ \sum_{i=1}^{n} \big( H(Z) - H(Z_1, \ldots, Z_i', \ldots, Z_n) \big)_+^2 \preceq aH(Z) + bI_d. \]

We expect concentration inequalities similar to Theorem 1 of Boucheron, Lugosi and Massart (2009) to hold for such functions.